Fast Identification of Similarly Expressed Genes

Author

  • Marco Dimas Gubitoso
Abstract

One of the most important problems in modern genomics is identifying which genes are responsible for a specific phenomenon within the cell. This problem can be partially answered by comparing expression data from two sets of related experiments, for instance a set of cancerous cells and a set of healthy cells, and looking for genes with similar expression levels. Unfortunately, the number of genes is usually much larger than the number of available expression data sets, and finding the best candidates is in general difficult and time consuming. We propose a different and simple representation of the expression data that gives a very quick way to establish how potentially related a gene is to each subset: instead of representing p microarrays, each with N genes spotted, as a set of p labelled points in ℝ^N, we simply transpose the matrix and represent the p experiments as N points in ℝ^p. In this new representation, the axes are labelled according to the experiment they represent, and ℝ^p can be decomposed into a Cartesian product of subspaces S0, S1, ..., Sm. A point in this space represents a single gene, and its projection onto the subspace Si indicates the behavior of that gene in the i-th class. Consider a gene g and its projection gi on Si. By analyzing the projection and the distance of gi to the diagonal of Si (the line generated by the vector (1, 1, ..., 1)), it is possible to generate several simple and significant metrics to evaluate the influence (or its absence) of g. The Fisher linear separator, for instance, has a straightforward geometric interpretation in this setup. The computation of most metrics takes linear time in N. A simple sort can then be used to identify the most and least significant genes, with an extremely short execution time (less than a second with 4000 genes and 50 samples on a Pentium III).
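The geometry described above can be made concrete with a small sketch. It is one plausible reading of the abstract, not the author's implementation: the function names, the two-class setup, and the toy data are all assumptions made for illustration. It relies on two standard facts: for a gene's projection onto a class subspace with k experiments, the coordinate along the diagonal is sqrt(k) times the class mean, and the squared distance to the diagonal is k times the class variance. A Fisher-style score can therefore be computed from those two quantities alone, in one pass per gene, followed by a sort.

```python
import math

def diagonal_geometry(values):
    """Return (coordinate along the diagonal, distance to the diagonal)
    for one gene's projection onto a single class subspace."""
    k = len(values)
    mean = sum(values) / k
    # Signed length of the projection onto the unit vector (1, ..., 1) / sqrt(k).
    along = mean * math.sqrt(k)
    # Euclidean distance from the point to the line spanned by (1, ..., 1).
    dist = math.sqrt(sum((v - mean) ** 2 for v in values))
    return along, dist

def fisher_score(values_a, values_b):
    """Fisher-style criterion (mean gap squared over summed variances),
    computed only from the diagonal geometry of the two projections."""
    ka, kb = len(values_a), len(values_b)
    along_a, dist_a = diagonal_geometry(values_a)
    along_b, dist_b = diagonal_geometry(values_b)
    mean_a = along_a / math.sqrt(ka)
    mean_b = along_b / math.sqrt(kb)
    var_a = dist_a ** 2 / ka   # population variance in class a
    var_b = dist_b ** 2 / kb   # population variance in class b
    denom = var_a + var_b
    if denom == 0.0:
        # Both projections lie exactly on their diagonals.
        return math.inf if mean_a != mean_b else 0.0
    return (mean_a - mean_b) ** 2 / denom

def rank_genes(expression, labels):
    """expression: gene name -> list of values, one per experiment.
    labels: class (0 or 1) of each experiment, in the same order.
    Returns gene names sorted from most to least discriminating."""
    scores = {}
    for gene, row in expression.items():
        a = [v for v, lab in zip(row, labels) if lab == 0]
        b = [v for v, lab in zip(row, labels) if lab == 1]
        scores[gene] = fisher_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: 3 genes, 6 experiments (3 per class).
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [0.5, 3.0, 1.5, 2.5, 0.7, 3.1],
}
labels = [0, 0, 0, 1, 1, 1]
ranked = rank_genes(expression, labels)
print(ranked)
```

Scoring touches each gene once, so for a fixed number of experiments the cost grows linearly with the number of genes; only the final sort adds a logarithmic factor.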
The method does not take groups of genes into account, but it can be very useful for discarding uninteresting genes or pinpointing the most significant ones before applying more sophisticated techniques. Some interaction among genes can be modeled in the metric definition and in the sort criteria. For p ≤ 3, representing the genes only by their projections on the diagonals of S∗ is a very interesting way to visualize the whole dataset. Experimental results using known data indicate that the selection made by this method has a large intersection with that made by much more expensive algorithms.
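As a rough illustration of the pre-filtering and visualization idea, the sketch below reduces each gene to one coordinate per class, namely the class mean, which locates its projection on the diagonal of that class subspace up to a scale factor. Genes whose coordinates barely differ between classes are then discarded. The helper names and the `min_gap` threshold are hypothetical; the abstract does not specify a filtering rule.

```python
def class_means(row, labels):
    """Map one gene's expression row to one coordinate per class:
    the mean over the experiments belonging to that class."""
    sums, counts = {}, {}
    for value, lab in zip(row, labels):
        sums[lab] = sums.get(lab, 0.0) + value
        counts[lab] = counts.get(lab, 0) + 1
    # Coordinates are ordered by class label so they are comparable across genes.
    return tuple(sums[lab] / counts[lab] for lab in sorted(sums))

def keep_candidates(expression, labels, min_gap):
    """Keep genes whose per-class coordinates differ by at least min_gap
    (an assumed, user-chosen threshold). Returns gene -> coordinates."""
    kept = {}
    for gene, row in expression.items():
        coords = class_means(row, labels)
        if max(coords) - min(coords) >= min_gap:
            kept[gene] = coords
    return kept

# Hypothetical toy data: 2 genes, 4 experiments (2 per class).
toy = {"g1": [1.0, 1.0, 4.0, 4.0], "g2": [2.0, 2.0, 2.0, 2.0]}
toy_labels = [0, 0, 1, 1]
candidates = keep_candidates(toy, toy_labels, min_gap=1.0)
print(candidates)
```

With two or three classes, the returned coordinates can be passed directly to any scatter-plot tool; the surviving genes are the ones worth handing to a more expensive method.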





Publication date: 2003